1,798 research outputs found

    Joint Vertex Degrees in an Inhomogeneous Random Graph Model

    Full text link
    In a random graph, counts for the number of vertices with given degrees will typically be dependent. We show via a multivariate normal and a Poisson process approximation that, for graphs which have independent edges, with a possibly inhomogeneous distribution, only when the degrees are large can we reasonably approximate the joint counts as independent. The proofs are based on Stein's method and the Stein-Chen method with a new size-biased coupling for such inhomogeneous random graphs, and hence bounds on distributional distance are obtained. Finally we illustrate that apparent (pseudo-) power-law type behaviour can arise in such inhomogeneous networks despite not actually following a power-law degree distribution.Comment: 30 pages, 9 figure

    The practical use of the A* algorithm for exact multiple sequence alignment

    Get PDF
    Multiple alignment is an important problem in computational biology. It is well known that it can be solved exactly by a dynamic programming algorithm which in turn can be interpreted as a shortest path computation in a directed acyclic graph. The A\cal{A}^* algorithm (or goal directed unidirectional search) is a technique that speeds up the computation of a shortest path by transforming the edge lengths without losing the optimality of the shortest path. We implemented the A\cal{A}^* algorithm in a computer program similar to MSA~\cite{GupKecSch95} and FMA~\cite{ShiIma97}. We incorporated in this program new bounding strategies for both, lower and upper bounds and show that the A\cal{A}^* algorithm, together with our improvements, can speed up comput ations considerably. Additionally we show that the A\cal{A}^* algorithm together with a standard bounding technique is superior to the well known Carillo-Lipman bounding since it excludes more nodes from consideration

    Fast and accurate read mapping with approximate seeds and multiple backtracking

    Get PDF
    We present Masai, a read mapper representing the state-of-the-art in terms of speed and accuracy. Our tool is an order of magnitude faster than RazerS 3 and mrFAST, 2-4 times faster and more accurate than Bowtie 2 and BWA. The novelties of our read mapper are filtration with approximate seeds and a method for multiple backtracking. Approximate seeds, compared with exact seeds, increase filtration specificity while preserving sensitivity. Multiple backtracking amortizes the cost of searching a large set of seeds by taking advantage of the repetitiveness of next-generation sequencing data. Combined together, these two methods significantly speed up approximate search on genomic data sets. Masai is implemented in C++ using the SeqAn library. The source code is distributed under the BSD license and binaries for Linux, Mac OS X and Windows can be freely downloaded from http://www.seqan.de/projects/masai

    RazerS 3: Faster, fully sensitive read mapping

    Get PDF
    Motivation: During the last years NGS sequencing has become a key technology for many applications in the biomedical sciences. Throughput continues to increase and new protocols provide longer reads than currently available. In almost all applications, read mapping is a first step. Hence, it is crucial to have algorithms and implementations that perform fast, with high sensitivity, and are able to deal with long reads and a large absolute number of indels. Results: RazerS is a read mapping program with adjustable sensitivity based on counting q-grams. In this work we propose the successor RazerS 3 which now supports shared-memory parallelism, an additional seed-based filter with adjustable sensitivity, a much faster, banded version of the Myers’ bit-vector algorithm for verification, memory saving measures and support for the SAM output format. This leads to a much improved performance for mapping reads, in particular long reads with many errors. We extensively compare RazerS 3 with other popular read mappers and show that its results are often superior to them in terms of sensitivity while exhibiting practical and often competetive run times. In addition, RazerS 3 works without a precomputed index. Availability and Implementation: Source code and binaries are freely available for download at http://www.seqan.de/projects/razers. RazerS 3 is implemented in C++ and OpenMP under a GPL license using the SeqAn library and supports Linux, Mac OS X, and Windows

    Segment-based multiple sequence alignment

    Get PDF
    Motivation: Many multiple sequence alignment tools have been developed in the past, progressing either in speed or alignment accuracy. Given the importance and wide-spread use of alignment tools, progress in both categories is a contribution to the community and has driven research in the field so far. Results: We introduce a graph-based extension to the consistency-based, progressive alignment strategy. We apply the consistency notion to segments instead of single characters. The main problem we solve in this context is to define segments of the sequences in such a way that a graph-based alignment is possible. We implemented the algorithm using the SeqAn library and report results on amino acid and DNA sequences. The benefit of our approach is threefold: (1) sequences with conserved blocks can be rapidly aligned, (2) the implementation is conceptually easy, generic and fast and (3) the consistency idea can be extended to align multiple genomic sequences. Availability: The segment-based multiple sequence alignment tool can be downloaded from http://www.seqan.de/projects/msa.html. A novel version of T-Coffee interfaced with the tool is available from http://www.tcoffee.org. The usage of the tool is described in both documentations. Contact: [email protected]

    RazerS - Fast Read Mapping with Sensitivity Control

    Get PDF
    Second-generation sequencing technologies deliver DNA sequence data at unprecedented high throughput. Common to most biological applications is a mapping of the reads to an almost identical or highly similar reference genome. Due to the large amounts of data, efficient algorithms and implementations are crucial for this task. We present an efficient read mapping tool called RazerS. It allows the user to align sequencing reads of arbitrary length using either the Hamming distance or the edit distance. Our tool can work either lossless or with a user-defined loss rate at higher speeds. Given the loss rate, we present an approach that guarantees not to lose more reads than specified. This enables the user to adapt to the problem at hand and provides a seamless tradeoff between sensitivity and running time
    corecore